Comparing Models of Phonotactics for Word Segmentation
Authors
Abstract
Developmental research indicates that infants use low-level statistical regularities, or phonotactics, to segment words from continuous speech. In this paper, we present a segmentation framework that enables the direct comparison of different phonotactic models for segmentation. We compare a model using phoneme transitional probabilities, which have been widely used in computational models, to syllable-based bigram models, which have played a prominent role in the developmental literature. We also introduce a novel estimation method and compare it to other strategies for estimating the parameters of the phonotactic models from unsegmented data. The results show that syllable-based models outperform the phoneme models, specifically in the context of improved unsupervised parameter estimation. The syllable-based transitional probability model achieves a word token F-score of nearly 80%, the highest reported performance for a phonotactic segmentation model with no lexicon.
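As a rough illustration of the quantities named in the abstract, the sketch below estimates forward transitional probabilities over syllable bigrams from unsegmented utterances, places a word boundary wherever the probability drops below a cutoff, and scores the result with a word token F-score. It is not the paper's framework or its estimation method: the fixed-threshold rule, the toy corpus, and the names `transitional_probabilities`, `segment`, and `token_f_score` are all assumptions made for this example.

```python
from collections import Counter


def transitional_probabilities(utterances):
    """Estimate forward TP(b | a) = count(a, b) / count(a as a left context)."""
    pair_counts = Counter()
    left_counts = Counter()
    for units in utterances:
        for a, b in zip(units, units[1:]):
            pair_counts[(a, b)] += 1
            left_counts[a] += 1
    return {pair: n / left_counts[pair[0]] for pair, n in pair_counts.items()}


def segment(units, tp, threshold):
    """Put a boundary between adjacent units whose TP is below threshold.

    The fixed threshold is an illustrative assumption, not the paper's rule.
    Unseen pairs are treated as having probability 0.
    """
    if not units:
        return []
    words, current = [], [units[0]]
    for a, b in zip(units, units[1:]):
        if tp.get((a, b), 0.0) < threshold:
            words.append(current)
            current = []
        current.append(b)
    words.append(current)
    return words


def token_f_score(predicted, gold):
    """Word token F-score: a token counts only if both of its edges match."""
    def spans(words):
        out, start = set(), 0
        for word in words:
            out.add((start, start + len(word)))
            start += len(word)
        return out

    pred_spans, gold_spans = spans(predicted), spans(gold)
    correct = len(pred_spans & gold_spans)
    if correct == 0:
        return 0.0
    precision = correct / len(pred_spans)
    recall = correct / len(gold_spans)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy corpus: each utterance is an unsegmented syllable sequence.
corpus = [
    ["pre", "tty", "ba", "by"],
    ["ba", "by", "do", "ggy"],
    ["pre", "tty", "do", "ggy"],
    ["ba", "by", "pre", "tty"],
]
tp = transitional_probabilities(corpus)
predicted = segment(corpus[0], tp, threshold=0.75)
gold = [["pre", "tty"], ["ba", "by"]]
score = token_f_score(predicted, gold)
```

In this toy corpus, syllable pairs inside a word always co-occur while pairs that straddle a word boundary vary, so the within-word probabilities come out higher and the cutoff separates the two cases. Swapping the syllable lists for phoneme lists gives the phoneme-level variant of the same computation.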
Similar resources
Improving Word Segmentation by Simultaneously Learning Phonotactics
The most accurate unsupervised word segmentation systems that are currently available (Brent, 1999; Venkataraman, 2001; Goldwater, 2007) use a simple unigram model of phonotactics. While this simplifies some of the calculations, it overlooks cues that infant language acquisition researchers have shown to be useful for segmentation (Mattys et al., 1999; Mattys and Jusczyk, 2001). Here we explore...
The Effect of Sonority on Word Segmentation: Evidence for a Phonological Universal
It has been well documented that language-specific cues, such as transitional probability (TP), stress, and phonotactics, can be used for word segmentation. In our current work, we investigate what role a phonological universal, the sonority sequencing principle (SSP), may also play. Participants were presented with an unsegmented stream of speech from an artificial language with non-English onset...
Use of Word Segmentation Cues in Adults: L1 Phonotactics versus L2 Transitional Probabilities
We investigate whether adult learners’ knowledge of phonotactic restrictions on word forms from their first language (L1) affects their word segmentation abilities in a new language. Adult learners were exposed to a speech stream in which language-specific and non-language-specific cues for word segmentation were pitted against one another. English rules about possible phonetic combinations (pho...
Do we use L1 probabilistic phonotactics in L2 listening?
The present study examined whether Cantonese-English bilingual listeners made use of their L1 probabilistic phonotactics in the segmentation process of English continuous speech (L2). Previous research in different languages demonstrated that probabilistic phonotactics could serve as a useful cue to locate the possible word boundary in continuous speech. The use of these kinds of info...
Phonological constraints in speech segmentation processes: investigating levels of implementation
Recent data have been considered as evidence for a role of phonological constraints in speech segmentation processes. A distributional analysis of consonant sequences in the French lexicon shows that these results may be accounted for by lexical competition or by transitional probability models in which no claim is made about the usefulness of phonology in speech perception. Two word-spotting e...
Journal title:
Volume, issue:
Pages: -
Publication date: 2014